12 - Artificial Intelligence I [ID:54619]

Okay, let's start.

Obviously winter is coming.

If you look around.

Yeah, one more thing.

We talked about Monte Carlo Tree Search, which is a method designed to get around the difficulty of finding evaluation functions.

Remember we have these search trees.

Those are much too big to explore.

And at the same time, we only get information about how well we're doing at the terminal nodes.

Okay?

So Minimax kind of rests on the fiction that you explore the whole tree, get the reward data at the terminal nodes, and propagate it all up.

And that doesn't work because the search trees are too big.

So what we do is we kind of, like a heuristic, make up values here.

How good does a certain state feel?

And of course, just as with heuristics, finding a good one is difficult.

Having a good one gives us a good search.

Having a bad one does the opposite.

And so in Monte Carlo Tree Search, instead of inventing a heuristic, which we call an evaluation function here, we basically sample a couple of these games randomly and then use the average result, or something like it, as an evaluation function.
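Just to make that concrete, here is a minimal sketch in Python; it is not from the lecture, and the game interface it assumes (actions, result, is_terminal, reward) is made up for illustration:

import random

def rollout_value(state, game, num_samples=100):
    # Estimate how good 'state' is by averaging the rewards
    # of a couple of purely random playouts.
    total = 0.0
    for _ in range(num_samples):
        s = state
        while not game.is_terminal(s):        # play random moves to the end
            s = game.result(s, random.choice(game.actions(s)))
        total += game.reward(s)               # reward is only available at terminal nodes
    return total / num_samples                # the average acts as the evaluation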

Okay, how does that work?

Oh yes, and we briefly talked about AlphaGo, because Monte Carlo Tree Search plus neural networks that learn how to work with the algorithm is actually a good solution; kind of completely out of the blue it solved the game of Go, which many had considered unsolvable before.

Okay, so the idea is that we're going to sample, and what we're doing is creating a Monte Carlo tree.

Rather than going through the algorithm description, let's look at an example.

So we start with a root node.

The new stuff, of course, is that with every node we associate a Monte Carlo Tree Search record, which keeps track of how often we've expanded each of the applicable actions and what the average reward for it is.

Okay, so that's easy, that's just initializing the tree with the initial node.
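As a rough sketch, such a per-node record could look like this in Python (the field names are my own, not the lecture's):

from dataclasses import dataclass, field

@dataclass
class MCTSRecord:
    # Bookkeeping per node: for each applicable action, how often
    # we have sampled through it and the average reward we observed.
    counts: dict = field(default_factory=dict)    # action -> number of samples
    averages: dict = field(default_factory=dict)  # action -> average reward so far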

So we pick one action randomly, and then another one and another one, and we do that all the way down until we hit a goal node.

Remember, goal nodes, those are the ones where you win, lose, or draw.

And say we're getting a reward of 10 there; then we kind of completely wipe out this thing.

Say we've expanded the middle action one time and got an average reward of 10 here.

That's just the bookkeeping for the sampling.

And then we do it again: we decide on the left one, go all the way down, and say we hit a 70; we remove everything, do the bookkeeping, and then we sample some more randomly.
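That bookkeeping is nothing more than an incremental average; here is a sketch, reusing the hypothetical MCTSRecord from above:

def update_record(record, action, reward):
    # One playout that started with 'action' ended with 'reward':
    # bump the count and fold the reward into the running average.
    n = record.counts.get(action, 0) + 1
    old_avg = record.averages.get(action, 0.0)
    record.counts[action] = n
    record.averages[action] = old_avg + (reward - old_avg) / n

After the first sample from the example, the middle action would have count 1 and average 10; after the second, the left action gets count 1 and average 70.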

And you can see that the games may have different lengths and so on.

And we keep on sampling and sampling, doing the bookkeeping, and we acquire basically an evaluation function.

Something that tells us how good the different actions actually smell.

And we sample some more randomly.

And then at some point we commit to an action.

At this point here we've decided, oh 60 looks better than 55 or even 35.

And so we're going to exploit the sampling information and take that action.

And that was it because now we basically repeat this.
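Putting it together, acting on the samples just means taking the action with the best average so far and then starting the whole sampling again from the new state; a sketch, again reusing the hypothetical pieces above:

def choose_action(record):
    # Exploit the sampling information: pick the action whose
    # average reward looks best (the 60 rather than the 55 or the 35).
    return max(record.averages, key=record.averages.get)

def play(game, state, samples_per_move=1000):
    # Sample, do the bookkeeping, commit to an action, repeat.
    while not game.is_terminal(state):
        record = MCTSRecord()
        for _ in range(samples_per_move):
            action = random.choice(game.actions(state))
            reward = rollout_value(game.result(state, action), game, num_samples=1)
            update_record(record, action, reward)
        state = game.result(state, choose_action(record))
    return game.reward(state)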

